Quality / Condition (14): OverallQual, OverallCond, RoofStyle, ExterQual, ExterCond, Exterior1st, HeatingQC, GarageCond, BsmtCond, BsmtQual, KitchenQual, GarageQual, Foundation, PavedDrive
Area (5): GrLivArea, TotalBsmtSF, GarageArea, WoodDeckSF, TotRmsAbvGrd
Bathrooms (4): FullBath, HalfBath, BsmtFullBath, BsmtHalfBath
Year (3): YearBuilt, YearRemodAdd, GarageYrBlt
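Categorical missing values: fill with "None"
e.g., when a house has no basement or garage, set the value to "None"
Numeric missing values: fill with 0
e.g., fill with 0 when there is no garage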
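The two imputation rules above can be sketched in pandas as follows. This is a minimal illustration, not the original notebook code: the toy rows are made up, and the split into `cat_cols` and `num_cols` is an assumption listed by hand here.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the housing data; the values are made up.
df = pd.DataFrame({
    "BsmtQual": ["Gd", np.nan, "TA"],
    "GarageQual": [np.nan, "TA", "Fa"],
    "GarageArea": [np.nan, 480.0, 300.0],
    "TotalBsmtSF": [856.0, np.nan, 920.0],
})

# Assumed column split (hypothetical names for this sketch).
cat_cols = ["BsmtQual", "GarageQual"]
num_cols = ["GarageArea", "TotalBsmtSF"]

# A missing category is read as "feature absent", so label it "None".
df[cat_cols] = df[cat_cols].fillna("None")
# A missing numeric value is read as "nothing to measure", so use 0.
df[num_cols] = df[num_cols].fillna(0)

print(df)
```

Filling with the string "None" rather than leaving NaN keeps the absent-feature case available as its own level for the later encoding step.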
Converting categorical variables to numeric values
e.g., 'ExterQual': {'None': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5}
Dummy coding
'RoofStyle', 'Exterior1st', 'Foundation', 'PavedDrive'
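As a rough sketch of the two encodings above: the quality scale is mapped to integers with the dictionary from the text, and a nominal column is expanded into dummy columns. The toy rows are made up, and leaving `pd.get_dummies` at its defaults (no dropped reference level) is an assumption, since the source does not say which options were used.

```python
import pandas as pd

# Toy rows; the values are made up for illustration.
df = pd.DataFrame({
    "ExterQual": ["Gd", "TA", "None", "Ex"],
    "RoofStyle": ["Gable", "Hip", "Gable", "Flat"],
})

# Ordered quality scale from the text, mapped to integers.
quality_map = {"None": 0, "Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}
df["ExterQual"] = df["ExterQual"].map(quality_map)

# Nominal column without a natural order: one indicator column per level.
df = pd.get_dummies(df, columns=["RoofStyle"])

print(df)
```

The same mapping would apply to the other quality columns on this scale, and the same `get_dummies` call to 'Exterior1st', 'Foundation', and 'PavedDrive'.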
[Figure: LassoCV - Selected Feature Coefficients; x-axis: Coefficient Value]
[Figure: Change in the Number of Selected Variables by Alpha Value; x-axis: Alpha, y-axis: Number of Selected Features]
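The second figure counts how many coefficients survive at each alpha. A minimal sketch of that count is below, using scikit-learn's plain `Lasso` over a hand-picked alpha grid rather than the `LassoCV` run from the notebook. The data is synthetic and the grid is an assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of eight columns carry signal.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 8
X = rng.standard_normal((n_samples, n_features))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.standard_normal(n_samples)

# Hypothetical alpha grid, from weak to strong regularization.
alphas = [0.01, 0.1, 1.0, 10.0]

n_selected = []
for alpha in alphas:
    model = Lasso(alpha=alpha).fit(X, y)
    # A feature counts as "selected" when its coefficient is nonzero.
    n_selected.append(int(np.count_nonzero(model.coef_)))

for alpha, count in zip(alphas, n_selected):
    print(f"alpha={alpha}: {count} features selected")
```

Larger alphas shrink more coefficients to exactly zero, which is why the selected-feature count falls as alpha grows.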
Model 1: ['GrLivArea', 'OverallQual', 'TotalBsmtSF', 'GarageArea', 'YearBuilt', 'OverallCond', 'BsmtFullBath', 'ExterQual', 'BsmtQual', 'KitchenQual']
Model 2: ['OverallQual', 'ExterQual', 'KitchenQual', 'YearBuilt', 'BsmtFullBath', 'GrLivArea', 'TotalBsmtSF', 'GarageArea', 'WoodDeckSF', 'RoofStyle_Hip']
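The summaries that follow include an `Intercept` row and a `RoofStyle_Hip[T.True]` row, which is consistent with a formula-based statsmodels fit on a boolean dummy column. A minimal sketch of such a fit is below. The data is synthetic, only three of the ten features are used, and the exact call in the notebook is not shown in the source, so treat this as an assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the housing data; coefficients are made up.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "GrLivArea": rng.uniform(800, 3000, n),
    "OverallQual": rng.integers(1, 11, n),
    "RoofStyle_Hip": rng.random(n) < 0.2,  # boolean dummy column
})
df["SalePrice"] = (
    20_000
    + 50 * df["GrLivArea"]
    + 10_000 * df["OverallQual"]
    + 8_000 * df["RoofStyle_Hip"]
    + rng.normal(0, 5_000, n)
)

# Build "SalePrice ~ f1 + f2 + ..." from a feature list.
features = ["GrLivArea", "OverallQual", "RoofStyle_Hip"]
formula = "SalePrice ~ " + " + ".join(features)

result = smf.ols(formula, data=df).fit()
print(result.summary())
```

Swapping `features` for either model's list above would produce a summary laid out like the ones in this section.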
OLS regression results, Model 2:
| Dep. Variable: | SalePrice | R-squared: | 0.852 |
| Model: | OLS | Adj. R-squared: | 0.852 |
| Method: | Least Squares | F-statistic: | 1425. |
| Date: | Thu, 24 Apr 2025 | Prob (F-statistic): | 0.00 |
| Time: | 22:17:48 | Log-Likelihood: | -29016. |
| No. Observations: | 2482 | AIC: | 5.805e+04 |
| Df Residuals: | 2471 | BIC: | 5.812e+04 |
| Df Model: | 10 | | |
| Covariance Type: | nonrobust | | |
| | coef | std err | t | P>|t| | [0.025 | 0.975] |
| Intercept | -4.544e+05 | 5e+04 | -9.084 | 0.000 | -5.52e+05 | -3.56e+05 |
| RoofStyle_Hip[T.True] | 1.002e+04 | 1574.907 | 6.360 | 0.000 | 6927.648 | 1.31e+04 |
| OverallQual | 1.278e+04 | 746.795 | 17.110 | 0.000 | 1.13e+04 | 1.42e+04 |
| ExterQual | 1.453e+04 | 1709.012 | 8.499 | 0.000 | 1.12e+04 | 1.79e+04 |
| KitchenQual | 1.083e+04 | 1347.954 | 8.031 | 0.000 | 8182.549 | 1.35e+04 |
| YearBuilt | 165.1962 | 26.669 | 6.194 | 0.000 | 112.901 | 217.492 |
| BsmtFullBath | 1.396e+04 | 1217.527 | 11.463 | 0.000 | 1.16e+04 | 1.63e+04 |
| GrLivArea | 56.4054 | 1.603 | 35.180 | 0.000 | 53.261 | 59.549 |
| TotalBsmtSF | 32.2841 | 1.837 | 17.572 | 0.000 | 28.681 | 35.887 |
| GarageArea | 33.9268 | 3.698 | 9.174 | 0.000 | 26.675 | 41.179 |
| WoodDeckSF | 25.1514 | 4.761 | 5.283 | 0.000 | 15.816 | 34.487 |
| Omnibus: | 866.326 | Durbin-Watson: | 1.993 |
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 10081.930 |
| Skew: | 1.309 | Prob(JB): | 0.00 |
| Kurtosis: | 12.520 | Cond. No. | 2.36e+05 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.36e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
OLS regression results, Model 1:
| Dep. Variable: | SalePrice | R-squared: | 0.854 |
| Model: | OLS | Adj. R-squared: | 0.853 |
| Method: | Least Squares | F-statistic: | 1443. |
| Date: | Thu, 24 Apr 2025 | Prob (F-statistic): | 0.00 |
| Time: | 22:17:48 | Log-Likelihood: | -29002. |
| No. Observations: | 2482 | AIC: | 5.803e+04 |
| Df Residuals: | 2471 | BIC: | 5.809e+04 |
| Df Model: | 10 | | |
| Covariance Type: | nonrobust | | |
| | coef | std err | t | P>|t| | [0.025 | 0.975] |
| Intercept | -7.068e+05 | 5.9e+04 | -11.970 | 0.000 | -8.23e+05 | -5.91e+05 |
| GrLivArea | 59.0755 | 1.590 | 37.151 | 0.000 | 55.957 | 62.194 |
| OverallQual | 1.182e+04 | 767.035 | 15.416 | 0.000 | 1.03e+04 | 1.33e+04 |
| TotalBsmtSF | 36.3921 | 1.886 | 19.298 | 0.000 | 32.694 | 40.090 |
| GarageArea | 36.0914 | 3.683 | 9.799 | 0.000 | 28.869 | 43.314 |
| YearBuilt | 279.1536 | 30.757 | 9.076 | 0.000 | 218.841 | 339.466 |
| OverallCond | 5718.6463 | 586.232 | 9.755 | 0.000 | 4569.090 | 6868.202 |
| BsmtFullBath | 1.467e+04 | 1204.983 | 12.177 | 0.000 | 1.23e+04 | 1.7e+04 |
| ExterQual | 1.48e+04 | 1702.489 | 8.691 | 0.000 | 1.15e+04 | 1.81e+04 |
| KitchenQual | 9208.5952 | 1354.144 | 6.800 | 0.000 | 6553.221 | 1.19e+04 |
| BsmtQual | 248.3778 | 999.315 | 0.249 | 0.804 | -1711.203 | 2207.958 |
| Omnibus: | 977.532 | Durbin-Watson: | 1.998 |
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 11176.054 |
| Skew: | 1.533 | Prob(JB): | 0.00 |
| Kurtosis: | 12.933 | Cond. No. | 2.80e+05 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.8e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
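As a quick consistency check on the two summaries, the reported AIC and BIC can be recomputed from the reported log-likelihoods using AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The sketch assumes k = 11 (10 predictors plus the intercept) and uses the rounded log-likelihoods as printed, so the results match the tables only to the displayed precision.

```python
import math

n_obs = 2482    # No. Observations in both summaries
n_params = 11   # Df Model (10) plus the intercept


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Log-likelihoods as printed in the two summaries above.
reported = {"R-squared 0.852 fit": -29016.0, "R-squared 0.854 fit": -29002.0}

for name, log_lik in reported.items():
    print(
        f"{name}: AIC={aic(log_lik, n_params):.3e}, "
        f"BIC={bic(log_lik, n_params, n_obs):.3e}"
    )
```

Both criteria come out lower for the fit with R-squared 0.854, in line with the values printed in the tables.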
[Figure: ten panels, one per Model 1 feature; x-axes: GrLivArea, OverallQual, TotalBsmtSF, GarageArea, YearBuilt, OverallCond, BsmtFullBath, ExterQual, BsmtQual, KitchenQual; y-axis labels blank]
Outlier removal (rows kept): df['GrLivArea'] <= 3500, df['GarageArea'] <= 1200, df['TotalBsmtSF'] <= 2500
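The three thresholds above can be applied as one boolean mask. This sketch assumes the conditions are combined with AND, so a row is kept only if it passes all three. The toy rows are made up.

```python
import pandas as pd

# Toy rows; each of the last three violates exactly one threshold.
df = pd.DataFrame({
    "GrLivArea": [1500, 4200, 2000, 1800],
    "GarageArea": [400, 500, 1300, 600],
    "TotalBsmtSF": [900, 1000, 1100, 2600],
})

# Keep a row only when it satisfies all three limits.
mask = (
    (df["GrLivArea"] <= 3500)
    & (df["GarageArea"] <= 1200)
    & (df["TotalBsmtSF"] <= 2500)
)
df_clean = df[mask].reset_index(drop=True)

print(f"kept {len(df_clean)} of {len(df)} rows")
```

The parentheses around each comparison are required because `&` binds more tightly than `<=` in Python.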
[Figure: Distribution of Maintenance Score; x-axis: Score, y-axis: Count]
[Figure: Sale Price vs Maintenance Score; x-axis: Maintenance Score, y-axis: Sale Price]
**Distribution by grade band**
**The 10 lowest-priced houses in grade A**
**Map visualization of the distribution by grade**
Average MaintenanceScore by neighborhood
[Figure: Neighborhood-wise Average Maintenance Score]
Most common maintenance grade in each neighborhood
Average SalePrice by grade
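The grade and neighborhood summaries listed above can be sketched with pandas group-by operations. How MaintenanceScore is computed and how grades are cut from it are not given in the source, so the scores, the `Grade` column name, and the toy rows here are all hypothetical.

```python
import pandas as pd

# Toy rows; scores, grades, and prices are made up for illustration.
df = pd.DataFrame({
    "Neighborhood": ["NAmes", "NAmes", "NAmes", "Gilbert", "Gilbert", "Gilbert"],
    "MaintenanceScore": [60.0, 80.0, 70.0, 90.0, 85.0, 50.0],
    "Grade": ["B", "A", "B", "A", "A", "C"],  # hypothetical grade column
    "SalePrice": [150_000, 200_000, 160_000, 230_000, 220_000, 120_000],
})

# Average maintenance score per neighborhood.
mean_score = df.groupby("Neighborhood")["MaintenanceScore"].mean()

# Most frequent grade within each neighborhood.
top_grade = df.groupby("Neighborhood")["Grade"].agg(
    lambda s: s.value_counts().idxmax()
)

# Average sale price per grade.
mean_price = df.groupby("Grade")["SalePrice"].mean()

# The (up to) 10 lowest-priced houses among grade A.
cheapest_a = df[df["Grade"] == "A"].nsmallest(10, "SalePrice")

print(mean_score, top_grade, mean_price, cheapest_a, sep="\n\n")
```

If two grades tie within a neighborhood, `idxmax` returns only one of them, so a real analysis may need an explicit tie-breaking rule.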